Introduction

First, let’s examine the cars dataset by displaying its first 6 rows.
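
The chunk that produced this output is not shown. A minimal sketch, assuming it was a plain `head()` call on R's built-in `cars` data frame (the name `first_rows` is introduced here only for illustration):

```r
# `cars` ships with base R (datasets package), so no import is needed.
# head() returns the first n rows; n = 6 is also its default.
first_rows <- head(cars, n = 6)
print(first_rows)
```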

##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

This is the entire dataset.

##    speed dist
## 1      4    2
## 2      4   10
## 3      7    4
## 4      7   22
## 5      8   16
## 6      9   10
## 7     10   18
## 8     10   26
## 9     10   34
## 10    11   17
## 11    11   28
## 12    12   14
## 13    12   20
## 14    12   24
## 15    12   28
## 16    13   26
## 17    13   34
## 18    13   34
## 19    13   46
## 20    14   26
## 21    14   36
## 22    14   60
## 23    14   80
## 24    15   20
## 25    15   26
## 26    15   54
## 27    16   32
## 28    16   40
## 29    17   32
## 30    17   40
## 31    17   50
## 32    18   42
## 33    18   56
## 34    18   76
## 35    18   84
## 36    19   36
## 37    19   46
## 38    19   68
## 39    20   32
## 40    20   48
## 41    20   52
## 42    20   56
## 43    20   64
## 44    22   66
## 45    23   54
## 46    24   70
## 47    24   92
## 48    24   93
## 49    24  120
## 50    25   85

LaTeX code

\(\frac{a+b}{c+d}\)

\[\lim\limits_{x \to \infty} \exp(-x) = 0\]

Exploratory Data Analysis

Let’s investigate the size and dimensionality of our dataset.
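
The calls behind the two numbers reported here are not shown. One plausible way to obtain them, assuming "size" means the number of rows and "dimensionality" the number of columns (`n_obs` and `n_vars` are illustrative names):

```r
# Number of observations (rows) and variables (columns) in the built-in data frame
n_obs <- nrow(cars)
n_vars <- ncol(cars)
print(n_obs)
print(n_vars)

# dim() returns both values at once as c(rows, columns)
dim(cars)
```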

## [1] 50
## [1] 2

The dataset has 50 observations (rows) and 2 variables (columns).

Variation within variables

Let’s summarise each variable using the five-number summary (minimum, first quartile, median, third quartile, maximum) together with the mean.
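
The layout of the output suggests it came from `summary()`, although the call itself is not shown. A short sketch under that assumption, with `fivenum()` added for comparison since it returns only the five numbers without the mean (`speed_five` is an illustrative name):

```r
# summary() on a data frame reports min, quartiles, median, mean and max per column
summary(cars)

# fivenum() gives Tukey's five-number summary for a single numeric vector:
# minimum, lower hinge, median, upper hinge, maximum
speed_five <- fivenum(cars$speed)
print(speed_five)
```

Note that `fivenum()` uses hinges while `summary()` uses quantiles, so the two can differ slightly on other datasets.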

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Let’s visualise the distribution of each variable using boxplots, first with base R graphics.
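
The base R figures are not reproduced in this text. A sketch of how they might be drawn, assuming one `boxplot()` call per variable; the titles and axis labels are illustrative additions, with units taken from the `cars` help page:

```r
# boxplot() draws the plot and invisibly returns the statistics it used
bp_speed <- boxplot(cars$speed, main = "Speed", ylab = "speed (mph)")
bp_dist <- boxplot(cars$dist, main = "Stopping distance", ylab = "dist (ft)")

# The third entry of $stats is the median shown as the thick line in each box
bp_speed$stats[3, 1]
bp_dist$stats[3, 1]
```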

This is a ggplot2 alternative, made interactive with plotly.

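library(ggplot2)   # ggplot(), aes(), geom_boxplot()
library(plotly)    # ggplotly()
library(magrittr)  # %>% pipe

# Boxplot of speed
cars %>%
  ggplot(aes(y = speed)) +
  geom_boxplot()
  # geom_histogram(binwidth = 5)  # disabled alternative layer

# With no argument, ggplotly() converts the most recently created ggplot
ggplotly()

# Boxplot of stopping distance
cars %>%
  ggplot(aes(y = dist)) +
  geom_boxplot()

ggplotly()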

This is the base R code for a scatterplot.

plot(cars)

This is a ggplot2 scatterplot, again made interactive with plotly.

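# Scatterplot of stopping distance against speed
xy_plot <- cars %>%
  ggplot(aes(x = speed, y = dist)) +
  geom_point() +
  theme_bw()

# Convert the static scatterplot into an interactive plotly widget
ggplotly(xy_plot)

# Pearson correlation between speed and stopping distance, rounded to 2 decimals
round(cor(cars$speed, cars$dist), 2)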
## [1] 0.81
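
To make the reported value less of a black box, here is a small sketch that recomputes it by hand. It assumes the default `method = "pearson"` of `cor()`, which is what the call above uses; the names `x`, `y` and `r_manual` are illustrative.

```r
x <- cars$speed
y <- cars$dist

# Pearson's r: sum of cross-products of the deviations from the means,
# divided by the square root of the product of the sums of squared deviations
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Should agree with the built-in cor() up to floating-point error
stopifnot(isTRUE(all.equal(r_manual, cor(x, y))))
round(r_manual, 2)
```

A value this close to 1 indicates a strong positive linear association: faster cars tend to need longer stopping distances.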